Uniformize model processors (models with special arg names) #32841

leloykun · 2024-08-16T08:23:28Z

What does this PR do?

Uniformizes kwargs for the processors of CLIPSeg, Nougat, Owlv2, and OwlViT, as discussed in Uniform kwargs for processors #31911
Adds backward compatibility for special call arguments passed as positional arguments. Special call args are arguments that carry data (e.g. negative prompt, segmentation images, etc.) but are neither one of [text, images, audio, videos] nor config values for the tokenizer, image processor, etc.
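A minimal sketch of the positional-argument backward compatibility described above, assuming a processor that keeps a list of its special call args by name. `ToyProcessor`, its `optional_call_args` list, and the `visual_prompt` name are illustrative, not the merged transformers code: extra positional arguments are captured with `*args` and mapped, in order, onto those names.

```python
from typing import Any, Dict, List, Optional


class ToyProcessor:
    """Hypothetical processor showing how positional special call args can keep working."""

    # Hypothetical list of special call args, in the order old code passed them positionally.
    optional_call_args: List[str] = ["visual_prompt"]

    def __call__(
        self,
        text: Optional[str] = None,
        images: Optional[Any] = None,
        *args: Any,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        if len(args) > len(self.optional_call_args):
            raise ValueError(
                f"Expected at most {len(self.optional_call_args)} extra positional argument(s), got {len(args)}."
            )
        # Map each extra positional argument onto its special-arg name, in order.
        positional = dict(zip(self.optional_call_args, args))
        for name in positional:
            # The same arg passed both ways is ambiguous, so refuse it.
            if name in kwargs:
                raise ValueError(f"`{name}` was passed both positionally and as a keyword argument.")
        kwargs.update(positional)
        return {"text": text, "images": images, **kwargs}


processor = ToyProcessor()
old_style = processor("a photo of a cat", "IMAGE", "PROMPT")
new_style = processor(text="a photo of a cat", images="IMAGE", visual_prompt="PROMPT")
print(old_style == new_style)  # True
```

The design keeps old positional calls working while steering everything else through keyword arguments.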


TODO:

Fixes # (issue)

Uniform kwargs for processors #31911

Who can review?

@zucchini-nlp @molbap @NielsRogge

molbap

Thanks for the contribution! I think handling text_pair and similar kwargs is a bit more challenging; we should be able to support them solely with TypedDicts, without adding extra args.
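One way to read the TypedDict-only suggestion, sketched under assumptions: if `text_pair` is declared as a text kwarg in a TypedDict, a flat `processor(..., text_pair=...)` call can be routed to the tokenizer without an extra named parameter. The TypedDict fields and `route_kwargs` below are hypothetical simplifications, not the library's kwargs-merging logic.

```python
from typing import Any, Dict, List, Optional, TypedDict, Union


class TextKwargs(TypedDict, total=False):
    # Hypothetical subset of tokenizer-call kwargs; text_pair carries data but is still a tokenizer arg.
    text_pair: Optional[Union[str, List[str]]]
    padding: Union[bool, str]
    max_length: int


class ImagesKwargs(TypedDict, total=False):
    # Hypothetical subset of image-processor kwargs.
    do_rescale: bool
    rescale_factor: float


def route_kwargs(**kwargs: Any) -> Dict[str, Dict[str, Any]]:
    """Split flat processor kwargs into per-modality dicts using the TypedDict annotations."""
    buckets = {"text_kwargs": TextKwargs, "images_kwargs": ImagesKwargs}
    output: Dict[str, Dict[str, Any]] = {name: {} for name in buckets}
    for key, value in kwargs.items():
        for name, typed_dict in buckets.items():
            if key in typed_dict.__annotations__:
                output[name][key] = value
                break
        else:
            # No TypedDict declares this key, so it is not a valid processor kwarg.
            raise TypeError(f"Unexpected processor kwarg: {key!r}")
    return output


print(route_kwargs(text_pair="a caption", padding=True, rescale_factor=0.5))
# {'text_kwargs': {'text_pair': 'a caption', 'padding': True}, 'images_kwargs': {'rescale_factor': 0.5}}
```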


molbap · 2024-08-16T09:19:27Z

src/transformers/models/nougat/processing_nougat.py

+        if output_kwargs["text_kwargs"].get("text_pair") is not None and audio is not None:
+            raise ValueError(
+                "You cannot provide `text_pair` as a positional argument and as a keyword argument at the same time. "
+                "Please provide it only as a keyword argument (i.e. `text_pair=...`)."
+            )


Hm, it is not robust to rely on audio here, or at least we have to explicitly comment on why we are doing that.

I think that's what we also do in UDOP. Maybe an explanatory comment is enough, since the audio arg name isn't expected in Nougat.

yup, I just followed the code here: #32544 (comment)

without this, old code that passed the args as positional arguments would break
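To illustrate why dropping the check would break old code, here is a hypothetical before/after pair of signatures (both simplified, with the argument order assumed): once `audio` occupies the third positional slot, an old-style third positional argument meant as `text_pair` lands in `audio`. A model that takes no audio can therefore treat a value there as a legacy `text_pair`, which is the implicit behavior the reviewers asked to document or replace.

```python
from typing import Any, Optional


def old_call(images: Any = None, text: Optional[str] = None, text_pair: Optional[str] = None) -> dict:
    """Hypothetical pre-uniformization signature: text_pair was the third positional arg."""
    return {"text": text, "text_pair": text_pair}


def new_call(
    images: Any = None, text: Optional[str] = None, audio: Any = None, videos: Any = None, **kwargs: Any
) -> dict:
    """Hypothetical uniformized signature: the third positional slot is now `audio`."""
    text_pair = kwargs.get("text_pair")
    if audio is not None:
        # This model takes no audio, so a value here can only be an old-style positional text_pair.
        if text_pair is not None:
            raise ValueError("`text_pair` was passed both positionally and as a keyword argument.")
        text_pair = audio
    return {"text": text, "text_pair": text_pair}


print(old_call(None, "question", "context") == new_call(None, "question", "context"))  # True
```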

I'll add comments & warnings for now, but let me know what you guys think is best.

I commented on the other PR as well. I missed it for Udop, but unfortunately I don't think it's a good precedent: it's a bit hacky as it stands. I'm trying to see if we can do it some other way that's cleaner.

Thanks for raising this! I have a new approach which I described in more detail here: #31911 (comment)

pls lemme know what you think abt it!


zucchini-nlp

Thanks for this awesome work!

The workaround seems a bit hacky, but I guess we need it for BC. Can we add tests for the newly added prepare_and_validate_optional_call_arg in the models that support extra args? And let's set a version at which we'll stop handling BC for users; we can keep it until v4.47, which gives users two major versions.
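A sketch of the kind of test requested here, assuming a standalone helper (the name `map_positional_call_args` is hypothetical; this is not `prepare_and_validate_optional_call_arg` itself) that emits a deprecation warning naming v4.47 as the cutoff. Using `FutureWarning` so that end users see it by default is also an assumption.

```python
import warnings
from typing import Any, Dict, Sequence

# Cutoff version taken from the review discussion.
BC_REMOVAL_VERSION = "v4.47"


def map_positional_call_args(arg_names: Sequence[str], *args: Any) -> Dict[str, Any]:
    """Hypothetical helper: map positional args onto special-arg names, warning that this path is deprecated."""
    if args:
        warnings.warn(
            "Passing special arguments positionally to the processor call is deprecated and will be "
            f"removed in {BC_REMOVAL_VERSION}. Please pass them as keyword arguments.",
            FutureWarning,
        )
    return dict(zip(arg_names, args))


# Minimal tests in the spirit of the review request.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert map_positional_call_args(["visual_prompt"], "PROMPT") == {"visual_prompt": "PROMPT"}
    assert any(issubclass(w.category, FutureWarning) for w in caught)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Keyword-only callers pass no positional args and must not be warned.
    assert map_positional_call_args(["visual_prompt"]) == {}
    assert not caught
```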

zucchini-nlp · 2024-08-19T05:16:52Z

src/transformers/processing_utils.py

+            raise ValueError(
+                f"Expected *at most* {len(self.optional_call_args)} optional positional arguments in processor call but received {len(args)}. "
+                "Passing positional arguments to the processor call is not recommended"
+            )


This message imo isn't very informative for users who have no idea what is happening under the hood. Maybe we could show the optional_call_args names and say that we couldn't map the positional args to them.

I've just changed the error message

@zucchini-nlp can you check if the new one's okay?
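Along the lines of the reviewer's suggestion, here is a hypothetical validator whose error names the args that positional values would be matched to. The function name and message wording are illustrative only.

```python
from typing import Any, Sequence


def check_positional_call_args(arg_names: Sequence[str], args: Sequence[Any]) -> None:
    """Hypothetical check: reject more positional args than there are special-arg names to map them to."""
    if len(args) > len(arg_names):
        raise ValueError(
            f"Expected at most {len(arg_names)} optional positional argument(s), which would be matched "
            f"in order to: {', '.join(arg_names)}. Got {len(args)} instead. "
            "Please pass these arguments as keyword arguments instead."
        )


try:
    check_positional_call_args(["visual_prompt"], ["a", "b"])
except ValueError as err:
    print(err)
```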

zucchini-nlp · 2024-08-19T05:22:04Z

src/transformers/models/clipseg/processing_clipseg.py

+        text: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
+        images: Optional[ImageInput] = None,


Another thing we're doing now is swapping the arg order so that it is images, text, audio, videos. And that needs another deprecation cycle...

BTW, I'm quite out of the loop: do we need this order swapping for the pipeline, @yonigozlan?
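A simplified sketch of how a deprecation cycle for the swapped order could work, assuming a crude heuristic that treats a string or a list of strings as text. The helper names are hypothetical and this is not the library's actual order-checking function: if old text-first code lands text in `images`, the values are swapped back with a warning.

```python
import warnings
from typing import Any, Tuple


def _is_text(value: Any) -> bool:
    # Hypothetical heuristic: a string or a non-empty list/tuple of strings counts as text.
    if isinstance(value, str):
        return True
    return isinstance(value, (list, tuple)) and len(value) > 0 and all(isinstance(v, str) for v in value)


def validate_images_text_order(images: Any, text: Any) -> Tuple[Any, Any]:
    """Swap images and text back (with a warning) if old code passed them in text-first order."""
    if _is_text(images) and not _is_text(text):
        warnings.warn(
            "You may have passed text as `images` and images as `text`; swapping them. "
            "The processor now expects `images` first, then `text`.",
            FutureWarning,
        )
        return text, images
    return images, text
```

Usage would look like `images, text = validate_images_text_order(images, text)` at the top of a processor's `__call__`.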

src/transformers/models/clipseg/processing_clipseg.py

tests/test_processing_common.py

zucchini-nlp · 2024-08-19T06:59:13Z

src/transformers/models/donut/image_processing_donut.py

@@ -232,6 +233,7 @@ def thumbnail(
                The channel dimension format of the input image. If not provided, it will be inferred.
        """
        input_height, input_width = get_image_size(image, channel_dim=input_data_format)
+        size = get_size_dict(size)


It's not clear why we needed these changes. Was this causing a CI failure?

these are utils for old processors that don't support the new image size format yet

we might as well add these here since (1) they help with backward compatibility, (2) they make the image-text-to-text pipeline easier to implement, and (3) they reduce to a no-op if size already follows the new image size format
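A hypothetical, simplified normalizer illustrating point (3): a legacy int or (height, width) size becomes a dict, and a dict passes through unchanged, so calling it on already-converted input is harmless. The real get_size_dict handles more formats (for example shortest_edge), and its exact rules are not reproduced here; the (height, width) tuple order is an assumption.

```python
from typing import Dict, Tuple, Union

SizeLike = Union[int, Tuple[int, int], Dict[str, int]]


def normalize_size(size: SizeLike) -> Dict[str, int]:
    """Hypothetical: convert legacy size formats to a {"height", "width"} dict."""
    if isinstance(size, dict):
        # Already in the new format: a no-op.
        return size
    if isinstance(size, int):
        # A single int is treated as a square size.
        return {"height": size, "width": size}
    height, width = size
    return {"height": height, "width": width}
```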

zucchini-nlp · 2024-08-19T07:01:03Z

src/transformers/models/owlv2/processing_owlv2.py

+        text: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
+        images: Optional[ImageInput] = None,


Oh, and also, these two should be swapped so that the order is image, text, kwargs

zucchini-nlp · 2024-08-19T07:01:26Z

src/transformers/models/owlvit/processing_owlvit.py

+        text: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
+        images: Optional[ImageInput] = None,
+        # The following is to capture `visual_prompt` argument that may be passed as a positional argument.


Same here for swapping

yonigozlan · 2024-08-19T17:08:53Z

src/transformers/processing_utils.py

+    text_pair: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]]
+    text_target: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]
+    text_pair_target: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]]


Should these be added here and not be treated as model-specific args?

they're in the base tokenizer class so I figured it's okay to put them here

lemme know if I should remove them

Personally, I'm okay with having them here, as they're general args accepted by all tokenizers, though not used by all processors.

Sounds good, that will remove the need for special args for Udop so that's great :)

leloykun · 2024-08-20T16:51:01Z

I'll also swap the text & images args in a separate PR

This PR should now be ready for review

yonigozlan · 2024-09-09T21:10:16Z

Hi @leloykun! Pinging this, as the blocking requirements for this PR should all be completed soon. Notably, the function that checks that the images and text inputs are in the correct order has been merged. Here's how it is used in LLaVA, for example.

Also, I think your test_processing_common in this PR is great and generalizes better to most models than the current one. Would you mind opening a separate PR just for that, to get it merged quickly so that other PRs can use it?
I'd say the same for processing_utils.
Thanks a lot!

yonigozlan · 2024-09-13T22:11:48Z

Hi again @leloykun, I opened a PR with some of your changes to processing_utils.py and test_processing_common.py here: #33479.
One notable change is in test_processing_common, where I replaced testing the images kwargs size and crop_size with do_rescale and rescale_factor, since not all image_processors accept size or crop_size.

If you think you will have the bandwidth to work on this, feel free to open a PR and I'll close mine. In any case, thanks a lot for your contributions on this!

leloykun mentioned this pull request on Aug 16, 2024 in Uniform kwargs for processors #31911 (Open, 40 tasks)

molbap reviewed Aug 16, 2024


This was referenced on Aug 16, 2024 by:

Uniformize model processors (models w/o special arg names) #32845 (Open)

Uniformize kwargs for image-text-to-text processors #32544 (Merged)

leloykun added 2 commits August 16, 2024 18:52

uniformize processor kwargs of nougat

508e1a4

add tests and more docs

257c690

leloykun force-pushed the fc--uniformize-nougat branch from 831e64c to 257c690 on August 16, 2024 10:56

add uniformization of processor kwargs of processors with special keys here

93e7070

leloykun changed the title ~~Uniformize kwargs for Nougat processor~~ Uniformize model processors (models *with* special arg names) Aug 17, 2024

refactor how we handle arguments passed as positional args

0128d19

leloykun requested review from zucchini-nlp and molbap August 17, 2024 11:34

zucchini-nlp reviewed Aug 19, 2024


leloykun added 3 commits August 19, 2024 15:51

address @zucchini's comments

8c36cfb

fix docs

3a2f7ef

rm video testing

a280b3a

yonigozlan reviewed Aug 19, 2024


leloykun added 2 commits August 20, 2024 19:34

make processor call implementations simpler too

ca925cc

fix test for clipseg and add more tests for owl models

71a7ee1

leloykun requested review from zucchini-nlp and yonigozlan August 20, 2024 16:51

fix test for clipseg

274b615

leloykun mentioned this pull request on Aug 21, 2024 in Uniform kwargs for processors of audio-text models #32906 (Draft, 9 tasks)

yonigozlan mentioned this pull request on Sep 13, 2024 in Add support for args to ProcessorMixin for backward compatibility #33479 (Merged, 5 tasks)
